Abstract. We prove that HITS, to "get right" $h$ of the top $k$ ranked nodes of an $N \ge 2k$ node graph, can require $h^{\Omega(N h/k)}$ iterations (i.e. a substantial $\Omega(N \frac{h \log h}{k})$ matrix multiplications even with a "squaring trick"). Our proof requires no algebraic tools and is entirely self-contained.

1 HITS

Kleinberg's celebrated HITS algorithm [11] ranks the nodes of a generic graph in order of "importance" based solely on the graph's topology. Originally proposed to rank web pages in order of authority (and still the basis of some search engines such as Ask [2]), it has been adapted to many different application domains, such as topic distillation [5], word stemming [5], automatic synonym extraction in a dictionary [4], item selection [17], and author ranking in question answer portals [9] (to name just a few - see also [13, ...]).

The original version of HITS works as follows. In response to a query, a search engine retrieves a set of nodes of the web graph on the basis of pure textual analysis; for each such node it also retrieves all nodes pointed to by it, and up to $d$ nodes pointing to it. Then HITS associates an authority score $a_i$ (as well as a hub score $h_i$) to each node $v_i$ of this base set, and iteratively updates these scores according to the formulas:

$$h_i^{(0)} = 1, \qquad a_i^{(k)} = \sum_{v_j \to v_i} h_j^{(k-1)}, \qquad h_i^{(k)} = \sum_{v_i \to v_j} a_j^{(k)}, \qquad k = 1, 2, \ldots \qquad (1)$$

where $v \to u$ denotes that $v$ points to $u$. At each step the authority and hub score vectors are normalized in $\|\cdot\|_2$.

Intuitively, HITS places a pebble on each node of the base set graph. At odd timesteps, each pebble on node $v$ sires a pebble on every node $u$ such that $v \to u$, and at even timesteps each pebble on node $v$ sires a pebble on every node $u$ such that $u \to v$ (a pebble is removed upon siring its children). Then, without normalization, $a_i^{(k)}$ equals the number of pebbles on $v_i$ at time $2k - 1$ and $h_i^{(k)}$ that at time $2k$.
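
The update rule and the pebble picture can be checked against each other on a toy graph. The following is a minimal sketch, not the authors' code: the adjacency-list representation and the function names `hits`, `normalize` and `pebble_counts` are assumptions introduced here for illustration.

```python
def normalize(x):
    """Scale a vector to unit Euclidean norm (left unchanged if it is all zeros)."""
    norm = sum(v * v for v in x) ** 0.5
    return [v / norm for v in x] if norm > 0 else list(x)


def hits(adj, iterations):
    """Run HITS on a directed graph; adj[j] lists the nodes that node j points to.

    Returns (authority, hub) score vectors, each normalized in the 2-norm.
    """
    n = len(adj)
    hub = [1.0] * n
    auth = [0.0] * n
    for _ in range(iterations):
        auth = [0.0] * n
        for j in range(n):
            for i in adj[j]:          # v_j -> v_i contributes h_j to a_i
                auth[i] += hub[j]
        auth = normalize(auth)
        # h_i is the sum of a_j over all v_j that v_i points to
        hub = normalize([sum(auth[j] for j in adj[i]) for i in range(n)])
    return auth, hub


def pebble_counts(adj, timesteps):
    """Unnormalized pebble process: one pebble per node at time 0."""
    n = len(adj)
    pebbles = [1] * n
    for t in range(1, timesteps + 1):
        new = [0] * n
        if t % 2 == 1:                # odd step: pebble on v sires on u when v -> u
            for v in range(n):
                for u in adj[v]:
                    new[u] += pebbles[v]
        else:                         # even step: pebble on v sires on u when u -> v
            for u in range(n):
                for v in adj[u]:
                    new[u] += pebbles[v]
        pebbles = new
    return pebbles
```

Because normalization only rescales the vectors, the authority vector after $k$ iterations should be proportional to the pebble counts at time $2k - 1$.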

2 Convergence in Score vs. Convergence in Rank

HITS essentially computes a dominant eigenvector of $A^T A$, where $A$ is the adjacency matrix of the base set, using the power method [7]; thus, the convergence rates of the hub and authority score vectors are well known [7]. Nevertheless, what is often really important [16] is the time taken by HITS to converge in rank: intuitively, after how many iterations nodes no longer change their relative rank. A formalization of this intuition is more challenging than it might appear [16]. For the purposes of this paper we define convergence in rank as follows:

Definition 1 Consider an iterative algorithm ALG providing at every iteration $t \ge 0$ a score vector $v^t = [v_1^t, \ldots, v_N^t]$ for the $N$ nodes $v_1, \ldots, v_N$ of a graph; and let the set of the (weakly) top $k$ nodes at step $t$ be $T_k^t = \{v_i : |\{v_j : v_j^t > v_i^t\}| < k\}$. Then ALG converges on $h$ of the top $k$ ranks in $\tau$ steps if $|\bigcap_{t \ge \tau} T_k^t| \ge h$.

In other words, an algorithm converges on $h$ of the top $k$ ranks in $\tau$ steps if after $\tau$ steps it already "gets right" at least $h$ of the $k$ (eventually) top ranked elements. This definition is closely related to that of convergence in the intersection metric [...] for the top $k$ positions; [16] provides a more thorough discussion of its relationship to other popular metrics such as Kendall's $\tau$, Cramér-von Mises' $W^2$, or Kolmogorov-Smirnov's $D$.
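
The definition can be made concrete with a short sketch. The helper names `top_k_set` and `converged_ranks` are hypothetical, and the check is only a finite-horizon approximation: the definition intersects the top-$k$ sets over all $t \ge \tau$, whereas the code can only intersect over the iterations that were actually recorded.

```python
def top_k_set(scores, k):
    """Weakly top-k nodes: those with fewer than k strictly higher-scored nodes."""
    return {
        i
        for i, s in enumerate(scores)
        if sum(1 for other in scores if other > s) < k
    }


def converged_ranks(score_history, k, tau):
    """Number of nodes that stay in the top-k set at every recorded step t >= tau.

    score_history[t] is the score vector at iteration t. Since the history is
    finite, the result is an upper bound on the intersection in the definition.
    """
    if tau >= len(score_history):
        raise ValueError("tau must index a recorded iteration")
    sets = [top_k_set(scores, k) for scores in score_history[tau:]]
    return len(set.intersection(*sets))
```

With this reading, ALG has converged on $h$ of the top $k$ ranks by step `tau` (within the recorded horizon) when `converged_ranks(history, k, tau) >= h`. Ties are handled by the "weakly top" convention, so a top-$k$ set may hold more than $k$ nodes.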

We prove the first non-trivial lower bound on the iterations HITS requires to converge in rank. All previous rank convergence studies save [16] are experimental, and none investigates HITS (focusing instead on PageRank [...]).

3 HITS Can Converge Slowly in Rank 

Informally, we prove that HITS on an $N$ node graph can take $h^{\Omega(N h/k)}$ steps to "get right" $h$ of the top $k$ elements. This effectively means $\Omega(N \frac{h \log h}{k})$ matrix multiplications using the standard "squaring trick" that computes the $p^{\text{th}}$ power of a matrix $M$ by first computing the matrices $M^2, M^4, \ldots, M^{2^{\lfloor \log p \rfloor}}$. More formally, we devote the rest of this section to the proof of:
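
The "squaring trick" mentioned above can be sketched as follows; this is a generic illustration with hypothetical helper names (`mat_mul`, `mat_pow_squaring`), not code from the paper. It first builds $M^2, M^4, \ldots$ by repeated squaring and then multiplies together the powers selected by the binary digits of $p$, counting matrix multiplications along the way.

```python
def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def mat_pow_squaring(m, p):
    """Return (m**p, number of matrix multiplications used), for p >= 1."""
    if p < 1:
        raise ValueError("p must be a positive integer")
    powers = [m]                      # powers[i] holds m^(2^i)
    mults = 0
    while (1 << len(powers)) <= p:    # repeated squaring up to 2^floor(log2 p)
        powers.append(mat_mul(powers[-1], powers[-1]))
        mults += 1
    result = None
    for i, power in enumerate(powers):
        if (p >> i) & 1:              # this power of two appears in p
            if result is None:
                result = power
            else:
                result = mat_mul(result, power)
                mults += 1
    return result, mults
```

The count is $\lfloor \log_2 p \rfloor$ squarings plus one product per additional set bit of $p$, so it grows logarithmically in $p$. This is why a number of iterations of the form $h^{\Omega(N h/k)}$ still translates into $\Omega(N \frac{h \log h}{k})$ matrix multiplications.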

Theorem 1 For all h and k such that k > h > 5, and all odd n > max(3, fe^^+^ )^ 
there is an undirected graph Th^k^n of N — [rEf 1 (2^ + h — 3) + 1 w %« ^_ /^ y^f. 
tices on which HITS requires more than t = — 4e (•^^T^)~^ — /i^(") steps to 
converge on h of the top k ranks (and the last term is h^^^~' for N > 2k). 

$\Gamma_{h,k,n}$ (Fig. 1) is formed by a subgraph $\Gamma_{m,n}$ (with $m = h - 3$) and $\ell = \lceil \frac{k-h+1}{h-3} \rceil$ isomorphic subgraphs $\Gamma'^{1}_{m,n}, \ldots, \Gamma'^{\ell}_{m,n}$. $\Gamma_{m,n}$ has $2n + m + 1$ vertices, $v_{-n}, \ldots, v_0, \ldots, v_{n+m}$. The $2n + 1$ vertices $v_{-n}, \ldots, v_0, \ldots, v_n$ form a chain, with $v_i$ connected to $v_{i-1}$. The first and last vertices of the chain, $v_{-n}$ and $v_n$, are also connected to each of the $m$ vertices $v_{n+1}, \ldots, v_{n+m}$. $\Gamma'_{m,n}$ has $2n + m$ vertices, $u_{-n}, \ldots, u_{-1}, u_1, \ldots, u_{n+m}$, and is almost isomorphic to $\Gamma_{m,n}$: $u_i$ is connected to $u_j$ if and only if $v_i$ is connected to $v_j$. The only difference is that $u_0$ is missing.
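
The construction can be written down directly. The sketch below is illustrative only: the function names `gamma` and `gamma_hkn`, the `(label, index)` vertex keys, and the choice to pass the number of copies `ell` as a parameter are all assumptions made here rather than part of the paper.

```python
def gamma(m, n, with_v0=True, label="v"):
    """Adjacency sets of Gamma_{m,n}, or of Gamma'_{m,n} when with_v0 is False.

    Vertices are keyed (label, i) for i in -n..n+m, with index 0 omitted
    when with_v0 is False.
    """
    indices = [i for i in range(-n, n + m + 1) if with_v0 or i != 0]
    adj = {(label, i): set() for i in indices}

    def connect(i, j):
        a, b = (label, i), (label, j)
        if a in adj and b in adj:     # edges touching a missing vertex are dropped
            adj[a].add(b)
            adj[b].add(a)

    for i in range(-n + 1, n + 1):    # the chain: v_i connected to v_{i-1}
        connect(i, i - 1)
    for j in range(n + 1, n + m + 1): # both chain ends touch each extra vertex
        connect(-n, j)
        connect(n, j)
    return adj


def gamma_hkn(h, ell, n):
    """Disjoint union of Gamma_{m,n} and ell copies of Gamma'_{m,n}, with m = h - 3."""
    m = h - 3
    graph = gamma(m, n, with_v0=True, label="v")
    for copy in range(1, ell + 1):
        graph.update(gamma(m, n, with_v0=False, label=f"u{copy}"))
    return graph
```

Counting vertices in this sketch gives $(2n + m + 1) + \ell (2n + m) = (\ell + 1)(2n + m) + 1$ for the whole graph, which follows from the vertex counts stated in the text.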






Figure 1: The graph $\Gamma_{h,k,n}$ is formed by the subgraph $\Gamma_{m,n}$ (first left) and $\ell$ subgraphs isomorphic to the subgraph $\Gamma'_{m,n}$ (second left), with $m \approx h$ and $\ell \approx k/h$.



The proof of the theorem proceeds as follows. After introducing some notation, Lemma 1 bounds the growth rate of the number of pebbles on $v_i$ and $u_i$ as a function of $i$. Lemma 1 allows us to prove, in Lemma 2, that eventually only a vanishing fraction of all pebbles resides outside $\Gamma_{m,n}$, and in Lemma 3 that $v_{n+1}$ acquires pebbles only minimally faster than $u_{n+1}$. We then prove the theorem showing that $\Gamma_{m,n}$ eventually holds all the top $k$ nodes, but $u_{n+1}, \ldots, u_{n+m}$ and the corresponding nodes in the subgraphs isomorphic to $\Gamma'_{m,n}$ for $t < \bar{t}$ still outrank $v_{n-1}, \ldots, v_0, \ldots, v_{-n+1}$ (and thus at least $\ell m > k - h$ of the top $k$ nodes lie outside $\Gamma_{m,n}$).

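Denote by $v^t$ the number of descendants at time $t$ of a pebble present at time $0$ on $v$ - which is also equal to the number of pebbles present on $v$ after a total of $t$ timesteps, since both quantities are described by the recursive equation $v^{t+1} = \sum_{w \to v} w^t$ with $v^0 = 1$ (the graph being undirected, $w \to v$ if and only if $v \to w$).
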
Also, mark with a timestamp $\tau$ any pebble present at time $\tau$ on $v_0$ and any pebble not on $v_0$ whose most recent ancestor on $v_0$ was present at time $\tau$. Note that the number of unmarked pebbles present at any given time on $v_i$ (for any $i$) is equal to the total number of pebbles present at that time on $u_i$; and, more generally, it is straightforward to verify by induction on $t - \tau$ that any pebble present on a vertex $u_i$ at time $\tau$ has, on any vertex $u_j$ and at any time $t \ge \tau$, a number of descendants equal to the number of descendants not marked after $\tau$ that any pebble present on a vertex $v_i$ at time $\tau$ has on $v_j$ at time $t$.
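
The correspondence between unmarked pebbles on $\Gamma_{m,n}$ and all pebbles on $\Gamma'_{m,n}$ can be checked numerically. The sketch below uses hypothetical helpers (`neighbors`, `simulate`) and integer vertex indices $-n, \ldots, n+m$; it tracks, on $\Gamma_{m,n}$, both the total pebble count and the count of pebbles with no ancestor on $v_0$, and separately runs the pebble process on the graph with vertex $0$ deleted.

```python
def neighbors(m, n, i):
    """Neighbors of v_i in Gamma_{m,n}, with vertices indexed -n..n+m."""
    if i > n:                         # one of the m extra vertices
        return [-n, n]
    result = []
    if i > -n:
        result.append(i - 1)
    if i < n:
        result.append(i + 1)
    if i in (-n, n):                  # chain ends also touch the extra vertices
        result.extend(range(n + 1, n + m + 1))
    return result


def simulate(m, n, steps):
    """Return (total, unmarked, prime) pebble counts after the given number of steps.

    total:    all pebbles on Gamma_{m,n}
    unmarked: pebbles on Gamma_{m,n} that never had an ancestor on v_0
    prime:    all pebbles on Gamma'_{m,n} (Gamma_{m,n} with vertex 0 deleted)
    """
    verts = range(-n, n + m + 1)
    total = {i: 1 for i in verts}
    unmarked = {i: (0 if i == 0 else 1) for i in verts}  # the pebble on v_0 is marked
    prime = {i: 1 for i in verts if i != 0}
    for _ in range(steps):
        total = {i: sum(total[j] for j in neighbors(m, n, i)) for i in verts}
        unmarked = {
            i: 0 if i == 0 else sum(unmarked[j] for j in neighbors(m, n, i))
            for i in verts
        }
        prime = {
            i: sum(prime[j] for j in neighbors(m, n, i) if j != 0)
            for i in prime
        }
    return total, unmarked, prime
```

In this sketch the difference `total[i] - unmarked[i]` is the number of marked pebbles on $v_i$, i.e. those descending from some pebble that sat on $v_0$.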



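Lemma 1 For any $t \ge 0$, and any $i, j$ such that $(i \equiv j) \bmod 2$ and $0 \le i < j \le n + 1$, we have $1 \le \frac{v_i^{t+1}}{v_i^t} \le \frac{v_j^{t+1}}{v_j^t} \le m + 1$ and similarly (if $i > 0$) $1 \le \frac{u_i^{t+1}}{u_i^t} \le \frac{u_j^{t+1}}{u_j^t} \le m + 1$.

Proof. We prove that $1 \le \frac{v_i^{t+1}}{v_i^t} \le \frac{v_j^{t+1}}{v_j^t} \le m + 1$ by induction on $t$. The base case $t = 0$ is easily verified. $\frac{v_i^{t+2}}{v_i^{t+1}} = \frac{\sum_{w \to v_i} w^{t+1}}{\sum_{w \to v_i} w^t}$ is a weighted average (with positive weights) of all ratios $\frac{w^{t+1}}{w^t}$. By inductive hypothesis,

$$1 \le \min_{w \to v_i} \frac{w^{t+1}}{w^t} \le \frac{v_i^{t+2}}{v_i^{t+1}} \le \max_{w \to v_i} \frac{w^{t+1}}{w^t} = \frac{v_{i+1}^{t+1}}{v_{i+1}^t} \le \frac{v_{j-1}^{t+1}}{v_{j-1}^t} = \min_{w \to v_j} \frac{w^{t+1}}{w^t} \le \frac{v_j^{t+2}}{v_j^{t+1}} \le \max_{w \to v_j} \frac{w^{t+1}}{w^t} \le m + 1.$$

The proof that $1 \le \frac{u_i^{t+1}}{u_i^t} \le \frac{u_j^{t+1}}{u_j^t} \le m + 1$ is identical. □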
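
As a sanity check, the sketch below tests numerically one reading of Lemma 1, namely that the per-step growth ratio of the pebble count is between $1$ and $m + 1$ and is non-decreasing along indices of equal parity. This reading is an assumption (the statement is reconstructed from a damaged source), and the helper names `neighbors`, `pebble_history` and `growth_rates_ordered` are hypothetical. Ratios are compared by integer cross-multiplication to avoid floating-point error.

```python
def neighbors(m, n, i, has_v0=True):
    """Neighbors of vertex i in Gamma_{m,n}, or in Gamma'_{m,n} if has_v0 is False."""
    if i > n:
        result = [-n, n]
    else:
        result = []
        if i > -n:
            result.append(i - 1)
        if i < n:
            result.append(i + 1)
        if i in (-n, n):
            result.extend(range(n + 1, n + m + 1))
    return [j for j in result if has_v0 or j != 0]


def pebble_history(m, n, steps, has_v0=True):
    """List of pebble-count dicts for times 0..steps (one pebble per vertex at time 0)."""
    verts = [i for i in range(-n, n + m + 1) if has_v0 or i != 0]
    counts = {i: 1 for i in verts}
    history = [counts]
    for _ in range(steps):
        counts = {
            i: sum(counts[j] for j in neighbors(m, n, i, has_v0)) for i in verts
        }
        history.append(counts)
    return history


def growth_rates_ordered(m, n, steps, has_v0=True):
    """Check 1 <= rate_i <= rate_j <= m + 1 for same-parity i < j <= n + 1."""
    history = pebble_history(m, n, steps, has_v0)
    lowest = 0 if has_v0 else 1       # Gamma' has no vertex 0
    for t in range(steps):
        now, nxt = history[t], history[t + 1]
        for i in range(lowest, n + 2):
            if not (now[i] <= nxt[i] <= (m + 1) * now[i]):
                return False
            for j in range(i + 2, n + 2, 2):
                # rate_i <= rate_j, written without division
                if nxt[i] * now[j] > nxt[j] * now[i]:
                    return False
    return True
```

A call such as `growth_rates_ordered(2, 3, 20)` exercises the claim on $\Gamma_{2,3}$, and passing `has_v0=False` does the same on $\Gamma'_{2,3}$; a finite simulation of this kind illustrates the lemma but does not replace the inductive proof.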
Lemma 2 $\forall i > 0$, $\lim_{t \to \infty} (u_i^t / v_i^t) = 0$.

Proof. For all $i > 0$ consider an unmarked pebble $p_i$ present at time $t$ on vertex $v_i$. At some time $t + \tau$, with $\tau \le 2n + 2$, $v_{n+1}$ holds at least one marked descendant $p'_i$ of $p_i$; by virtue of Lemma 1, $p'_i$ thereafter always has at least as many descendants as any of the other at most $(m + 1)^{\tau}$ descendants of $p_i$ present at time $t + \tau$. Then, every $2n + 2$ timesteps, the fraction of unmarked descendants of an unmarked pebble drops by a factor at least $1 - (m + 1)^{-(2n+2)}$. □

Lemma 3 4SK4^ < 1 + "+^ f" ■ 

<+i/"Ui ~ " <^ 

Proof. Denote by D*^ the number of descendants at time t of a pebble ini- 
tially on Vn+i whose timestamp is r, and with D*^ the number of those de- 
scendants yet unmarked. Since all pebbles whose timestamp is r descend from 
pebbles present in vq at time r, Lemma [1] guarantees that the growth rates 



^_^^.^^. ^i'+i n'+i _ . . . . _r.1 , _. „. v*+\ 



of -D^ , satisfy —f^ > —^ for any r for which DV ^ 0. Thus, -^^^^^ 



D^-I-D'H h-Dt* ^ D*,+_D*H h-DJ - Df, ^ w^^^ ~ ^I^ "•" m^^F^' '^^^^ 

therefore ^S^^ < 1 + ^4^ ■ 4^ < 1 + ^4^ < 1 + ™±i^. D 

We can now prove Theorem [1] By Lemma [51 lim (w^X^iwrr ^*) — Wu ^ 

t — >(yO h.k.n 

fm,n; whereas Vt, < i < n-l-1, by Lemma[T]u| > („,+2«-nKr«+i)''+i S«Gf„,„ ^*- 
Thus, eventually the top (m -I- 2ri -I- 1) > fc ranked nodes all belong to Tm,n- 
We complete the proof showing that, forn — l<t<i= — "L (t)^^ ~ 

m^^ \ we have ^^^^^ — ^^^^^ < | and thus by Lemma [1] at least Hm elements 

"n+i ° 

_ 3 

outside Tra,n are among the top k ranked nodes. Note that -^-^ — T^^if; that 



4m+4 ' 



ll - m+1 ■ cr.r\ <->,=,<- J!n_ <- ("+!)".. - "i+i 



^ = 2m^+3m+3 ' ^^^ that ^ < ^"^^p' - ^. Then, by Lemma [1 
maxK „ ^^-.) < <i^ . ^ax(4-i, ^ . ^) < 4ii . I for m > 3. All is left 
to prove is that -p^ < ? for n — 1 < t < t. 

We first prove that, for ri — 1 < t < t, -^j^^— < e(^)~2- . It is straightforward 
that vi;-^ = 2""i and u^^^J = t;;'-J > (2m)T^. Then, Vt such that ;^ < 
e(A)-Ti ^ 3 ghave ^£y^ = ^^^.^^ < l.(l+ (^^^f ) < 

"^+l/'"i+l '"!, + i/'"n+i "l+l/"^ + l ~ m 4r/ — 

1 -I- i and it takes at least f timesteps for "° to grow by a factor e to e(— ) ~2- . 

ThiT; ''"+1 - ^"" + 1 . n*"l < + i/<+i < 1 . n 4- MTln(7/6) < 7 



4 Conclusions and Open Problems 

This paper presents a self-contained proof that HITS might require $h^{\Omega(N h/k)}$ iterations to "get right" $h$ of the top $k$ nodes of an $N \ge 2k$ node graph. This translates into $\Omega(N \frac{h \log h}{k})$ matrix multiplications even using a "squaring trick" - a substantial load when HITS must be used on-line on large graphs (e.g. in web search engines).

We conjecture that $g^{\Theta(N h/k)}$ is a tight worst case bound on the iterations required by HITS to converge in rank on $h$ of the top $k$ ranked nodes of an $N \ge 2k$ node graph of maximum degree $g$. This is slightly more (for $h$ subpolynomial in $N$) than the lower bound presented here.